Abstract



We use methods of random matrix theory to analyze the cross-correlation matrix $C$ of price changes of the largest 1000 US stocks for the 2-year period 1994-95. We find that the statistics of most of the eigenvalues in the spectrum of $C$ agree with the predictions of random matrix theory, but there are deviations for a few of the largest eigenvalues. We find that $C$ has the universal properties of the Gaussian orthogonal ensemble of random matrices. Furthermore, we analyze the eigenvectors of $C$ through their inverse participation ratio and find eigenvectors with large inverse participation ratios at both edges of the eigenvalue spectrum, a situation reminiscent of results in localization theory.
There has been much recent work applying physics concepts and methods to the study of financial time series [?]. In particular, the study of correlations between price changes of different stocks is both of scientific interest and of practical relevance in quantifying the risk of a given stock portfolio [1, 2]. Consider, for example, the equal-time correlation of stock price changes for a given pair of companies. Since market conditions may not be stationary and the historical records are finite, it is not clear whether a measured correlation of price changes of two stocks is just due to "noise" or genuinely arises from the interactions between the two companies. Moreover, unlike most physical systems, there is no "algorithm" to calculate the "interaction strength" between two companies (as there is for, say, two spins in a magnet). The problem is that although every pair of companies should interact either directly or indirectly, the precise nature of the interaction is unknown.

In some ways, the problem of interpreting the correlations between individual stock-price changes is reminiscent of the difficulties experienced by physicists in the fifties in interpreting the spectra of complex nuclei. Large amounts of spectroscopic data on the energy levels were becoming available but were too complex to be explained by model calculations, because the exact nature of the interactions was unknown. Random matrix theory (RMT) was developed in this context to deal with the statistics of energy levels of complex quantum systems [15, 16]. With the minimal assumption of a random Hamiltonian, given by a real symmetric matrix with independent random elements, a series of remarkable predictions were made and successfully tested on the spectra of complex nuclei [15]. RMT predictions represent an average over all possible interactions [16]. Deviations from the universal predictions of RMT identify system-specific, non-random properties of the system under consideration, providing clues about the nature of the underlying interactions [17, 18].



In this letter, we apply RMT methods to study the cross-correlations [10] of stock price changes. First, we demonstrate the validity of the universal predictions of RMT for the eigenvalue statistics of the cross-correlation matrix. Second, we calculate the deviations of the empirical data from the RMT predictions, obtaining information that enables us to identify cross-correlations between stocks that are not explainable purely by randomness.



We analyze a database [20] containing the price $S_i(t)$ of stock $i$ at time $t$, where $i = 1, \ldots, 1000$ labels the 1000 largest publicly traded companies and the time $t$ runs over the 2-year period 1994-95. From this time series, we calculate the price change $G_i(t, \Delta t)$, defined as

$$G_i(t, \Delta t) \equiv \ln S_i(t + \Delta t) - \ln S_i(t), \qquad (1)$$

where $\Delta t = 30$ min is the sampling time scale. The simplest measure of correlations between different stocks is the equal-time cross-correlation matrix $C$, which has elements

$$C_{ij} \equiv \frac{\langle G_i G_j \rangle - \langle G_i \rangle \langle G_j \rangle}{\sigma_i \sigma_j}, \qquad (2)$$

where $\sigma_i \equiv \sqrt{\langle G_i^2 \rangle - \langle G_i \rangle^2}$ is the standard deviation of the price changes of company $i$, and $\langle \cdots \rangle$ denotes a time average over the period studied [?].
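As a minimal sketch of Eqs. (1) and (2), assume the prices sit in a hypothetical NumPy array `prices` of shape (N, T+1), sampled every $\Delta t$; the time averages $\langle \cdots \rangle$ then become sample means over the $T$ price changes. The synthetic random-walk prices below are stand-in data, not the TAQ records, and the function names are our own.

```python
import numpy as np

def log_returns(prices):
    """Eq. (1): G_i(t) = ln S_i(t + dt) - ln S_i(t), one row per stock."""
    return np.diff(np.log(prices), axis=1)

def cross_correlation(G):
    """Eq. (2): C_ij = (<G_i G_j> - <G_i><G_j>) / (sigma_i sigma_j)."""
    T = G.shape[1]
    mean = G.mean(axis=1)                              # <G_i>
    sigma = np.sqrt((G**2).mean(axis=1) - mean**2)     # sigma_i
    cov = G @ G.T / T - np.outer(mean, mean)           # <G_i G_j> - <G_i><G_j>
    return cov / np.outer(sigma, sigma)

# Synthetic stand-in prices: 5 hypothetical stocks, 200 price changes each.
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal((5, 201)), axis=1))
G = log_returns(prices)
C = cross_correlation(G)
```

Dividing by $\sigma_i \sigma_j$ makes $C$ dimensionless with a unit diagonal, so stocks with very different volatilities are compared on the same footing; the result coincides with `np.corrcoef(G)`.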

We analyze the statistical properties of $C$ by applying RMT techniques. First, we diagonalize $C$ and obtain its eigenvalues $\lambda_k$, with $k = 1, \ldots, 1000$, which we rank-order from the smallest to the largest. Next, we calculate the eigenvalue distribution [10] and compare it with recent analytical results for a cross-correlation matrix generated from finite uncorrelated time series [21]. Figure 1 shows the eigenvalue distribution of $C$, which deviates from the predictions of Ref. [21] for large eigenvalues $\lambda_k > 1.94$ (see the caption of Fig. 1). This result agrees with the results of Ref. [10] for the eigenvalue distribution of $C$ on a daily time scale.
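The finite eigenvalue range quoted in the Fig. 1 caption can be checked with the standard result for correlation matrices of $N$ uncorrelated, unit-variance series of length $T$. We assume this is the result of Ref. [21]; the function names are our own. With $R = T/N \ge 1$, the density is nonzero only between $\lambda_\pm = 1 + 1/R \pm 2\sqrt{1/R}$.

```python
import math
import numpy as np

def rmt_bounds(R):
    """Edges of the eigenvalue range for N uncorrelated series of length T, R = T / N >= 1."""
    lam_min = 1 + 1 / R - 2 * math.sqrt(1 / R)
    lam_max = 1 + 1 / R + 2 * math.sqrt(1 / R)
    return lam_min, lam_max

def rmt_density(lam, R):
    """Eigenvalue density between lam_min and lam_max; zero outside that interval."""
    lam = np.asarray(lam, dtype=float)
    lo, hi = rmt_bounds(R)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    rho[inside] = R / (2 * np.pi) * np.sqrt((hi - x) * (x - lo)) / x
    return rho

lam_min, lam_max = rmt_bounds(6.448)   # about 0.37 and 1.94, the interval quoted for Fig. 1
```

Under this null model no eigenvalue should exceed `lam_max`, so an empirical eigenvalue such as $\lambda_{1000} \approx 50$ points to genuine correlations rather than finite-sample noise.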

To test for universal properties, we first calculate the distribution of the nearest-neighbor spacings $s \equiv \lambda_{k+1} - \lambda_k$. The nearest-neighbor spacing is computed after transforming the eigenvalues so that their distribution becomes uniform, a procedure known as unfolding [17-19]. Figure 2(a) shows the distribution of nearest-neighbor spacings for the empirical data and compares it with the RMT predictions for real symmetric random matrices. This class of matrices shares universal properties with the ensemble of matrices whose elements are distributed according to a Gaussian probability measure, the Gaussian orthogonal ensemble (GOE). We find good agreement between the empirical data and the GOE prediction

$$P_{\mathrm{GOE}}(s) = \frac{\pi s}{2} \exp\left(-\frac{\pi}{4} s^2\right). \qquad (3)$$
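The unfolding and spacing steps can be sketched with the Gaussian-broadening recipe from the Fig. 2 caption. Each eigenvalue is smeared into a Gaussian of standard deviation $(\lambda_{k+a} - \lambda_{k-a})/2$, and the unfolded value of $\lambda_k$ is the smoothed cumulative count at $\lambda_k$. The clipping of $k \pm a$ at the spectrum edges and the function names are our assumptions, not details given in the text.

```python
import numpy as np
from math import erf

_erf = np.vectorize(erf)   # elementwise error function using only the standard library

def unfold_gaussian_broadening(eigs, a=15):
    """Map each sorted eigenvalue lambda_k to the smoothed cumulative count at lambda_k,
    where eigenvalue j is broadened into a Gaussian of standard deviation
    w_j = (lambda_{j+a} - lambda_{j-a}) / 2."""
    lam = np.sort(np.asarray(eigs, dtype=float))
    n = lam.size
    idx = np.arange(n)
    # Indices k + a and k - a are clipped at the spectrum edges (an assumption).
    w = (lam[np.minimum(idx + a, n - 1)] - lam[np.maximum(idx - a, 0)]) / 2
    z = (lam[:, None] - lam[None, :]) / (np.sqrt(2) * w[None, :])
    return (0.5 * (1 + _erf(z))).sum(axis=1)   # sum of Gaussian CDFs at each lambda_k

def nn_spacings(unfolded):
    """Nearest-neighbor spacings of unfolded eigenvalues, rescaled to unit mean."""
    s = np.diff(np.sort(unfolded))
    return s / s.mean()

def p_goe(s):
    """Eq. (3): GOE nearest-neighbor spacing distribution (Wigner surmise)."""
    return np.pi * s / 2 * np.exp(-np.pi * s**2 / 4)
```

With $a = 15$ the broadening window spans about $2a = 30$ neighboring eigenvalues. The hallmark of the GOE is level repulsion: $P_{\mathrm{GOE}}(s) \to 0$ linearly as $s \to 0$, so very small spacings are rare, whereas uncorrelated levels would give $e^{-s}$.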

A second independent test of the GOE is the distribution of next-nearest-neighbor spacings between the rank-ordered eigenvalues [?]. This distribution is expected to be identical to the distribution of nearest-neighbor spacings of the Gaussian symplectic ensemble (GSE), as verified by the empirical data [Fig. 2(b)].
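A minimal sketch of the next-nearest-neighbor test, assuming the eigenvalues are already unfolded: the spacings $\lambda_{k+2} - \lambda_k$ are rescaled to unit mean and compared with the Wigner surmise for the GSE, $P_{\mathrm{GSE}}(s) = \frac{2^{18}}{3^6 \pi^3} s^4 \exp\left(-\frac{64}{9\pi} s^2\right)$. The surmise is a standard small-matrix approximation that we supply; it is not an expression quoted in the text.

```python
import numpy as np

def nnn_spacings(unfolded):
    """Next-nearest-neighbor spacings lambda_{k+2} - lambda_k, rescaled to unit mean."""
    u = np.sort(np.asarray(unfolded, dtype=float))
    s2 = u[2:] - u[:-2]
    return s2 / s2.mean()

def p_gse(s):
    """Wigner surmise for the nearest-neighbor spacing distribution of the GSE (unit mean)."""
    return 2**18 / (3**6 * np.pi**3) * s**4 * np.exp(-64 * s**2 / (9 * np.pi))
```

The $s^4$ factor means that small next-nearest-neighbor spacings are suppressed far more strongly than the linear repulsion of Eq. (3).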

The distribution of eigenvalue spacings reflects correlations only between consecutive eigenvalues; it contains no information about correlations of longer range. To probe any "long-range" correlations, we first calculate the number variance $\Sigma^2(L)$, defined as the variance of the number of unfolded eigenvalues in intervals of length $L$ around each of the eigenvalues [17-19, 22]. If the eigenvalues are uncorrelated, $\Sigma^2 \sim L$. For the opposite case of a "rigid" eigenvalue spectrum, $\Sigma^2$ is a constant. For the GOE case, we find the "intermediate" behavior $\Sigma^2 \sim \ln L$, as predicted by RMT [Fig. 2(c)].

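The number variance can be estimated directly from the unfolded eigenvalues. This sketch centers a window of length $L$ on each eigenvalue, skips windows that would cross the ends of the spectrum (our choice, to avoid edge bias), and provides the standard large-$L$ GOE approximation $\Sigma^2(L) \approx \frac{2}{\pi^2}\left[\ln(2\pi L) + \gamma + 1 - \frac{\pi^2}{8}\right]$, where $\gamma$ is Euler's constant. The function names are hypothetical.

```python
import numpy as np

def number_variance(unfolded, L):
    """Variance of the number of unfolded eigenvalues in windows of length L
    centered on each eigenvalue (windows crossing the spectrum edges are skipped)."""
    x = np.sort(np.asarray(unfolded, dtype=float))
    centers = x[(x - L / 2 >= x[0]) & (x + L / 2 <= x[-1])]
    counts = (np.searchsorted(x, centers + L / 2, side="right")
              - np.searchsorted(x, centers - L / 2, side="left"))
    return counts.var()

def sigma2_goe(L):
    """Large-L approximation to the GOE number variance."""
    return 2 / np.pi**2 * (np.log(2 * np.pi * L) + np.euler_gamma + 1 - np.pi**2 / 8)

# Two limiting cases: a perfectly rigid "picket fence" and uncorrelated (Poisson) levels.
rigid = np.arange(1000.0)
poisson = np.cumsum(np.random.default_rng(2).exponential(1.0, 20000))
```

The picket fence gives $\Sigma^2 = 0$ and the Poisson levels give $\Sigma^2 \approx L$, bracketing the logarithmic growth expected for the GOE.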
A second way to measure "long-range" correlations in the eigenvalues is through the spectral rigidity $\Delta(L)$, defined as the least-squares deviation of the unfolded cumulative eigenvalue density from a fit to a straight line in an interval of length $L$ [17-19, 23]. For uncorrelated eigenvalues, $\Delta \sim L$, whereas for the rigid case $\Delta$ is a constant. For the GOE case we find $\Delta \sim \ln L$, as predicted by RMT [Fig. 2(d)].
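A direct, if slow, sketch of the spectral rigidity: the cumulative count $N(x)$ of unfolded eigenvalues is sampled on a fine grid in each window $[x_0, x_0 + L]$, a straight line is fitted by least squares with `np.polyfit`, and the mean squared residual is averaged over window positions. The grid resolution and number of windows are arbitrary choices here, and the reference curve is the standard large-$L$ GOE approximation $\Delta(L) \approx \frac{1}{\pi^2}\left[\ln(2\pi L) + \gamma - \frac{5}{4} - \frac{\pi^2}{8}\right]$.

```python
import numpy as np

def spectral_rigidity(unfolded, L, n_windows=50, n_grid=500):
    """Least-squares deviation of the cumulative count N(x) from a straight line
    over windows of length L, averaged over window start positions."""
    x = np.sort(np.asarray(unfolded, dtype=float))
    values = []
    for x0 in np.linspace(x[0], x[-1] - L, n_windows):
        grid = np.linspace(x0, x0 + L, n_grid)
        staircase = np.searchsorted(x, grid, side="right")    # N(x): eigenvalues <= x
        slope, intercept = np.polyfit(grid, staircase, 1)
        residual = staircase - (slope * grid + intercept)
        values.append(np.mean(residual**2))                     # approximates (1/L) * integral
    return float(np.mean(values))

def delta_goe(L):
    """Large-L approximation to the GOE spectral rigidity."""
    return (np.log(2 * np.pi * L) + np.euler_gamma - 5 / 4 - np.pi**2 / 8) / np.pi**2
```

A rigid picket-fence spectrum gives $\Delta \approx 1/12$ for any $L$, the constant limit mentioned above, while Poisson levels give $\Delta = L/15$.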

Having demonstrated that the eigenvalue statistics of $C$ satisfy the RMT predictions, we now proceed to analyze the eigenvectors of $C$. RMT predicts that the components of the eigenvectors of a GOE matrix, rescaled so that the squares of the $N$ components sum to $N$, are distributed according to a Gaussian probability distribution with mean zero and variance one. In agreement with recent results [?], we find that eigenvectors corresponding to most eigenvalues in the "bulk" ($\lambda_k < 2$) follow this prediction. On the other hand, eigenvectors with eigenvalues outside the bulk ($\lambda_k > 2$) show marked deviations from the Gaussian distribution. In particular, the vector corresponding to the largest eigenvalue $\lambda_{1000}$ deviates significantly from the Gaussian distribution predicted by RMT.
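To compare eigenvector components with the Gaussian prediction, the unit-norm eigenvectors returned by `np.linalg.eigh` must first be rescaled by $\sqrt{N}$ so that their variance is one. A minimal sketch, assuming a correlation matrix built from hypothetical uncorrelated Gaussian series; for a unit-variance Gaussian the fourth moment $\langle u^4 \rangle$ equals 3, so a strong departure from 3 is a simple flag for non-Gaussian eigenvectors.

```python
import numpy as np

def rescaled_eigenvectors(C):
    """Eigenvalues (ascending) and eigenvectors (columns) rescaled so sum_l u_kl^2 = N."""
    vals, vecs = np.linalg.eigh(C)
    return vals, np.sqrt(C.shape[0]) * vecs

def fourth_moment(components):
    """<u^4> of each column; about 3 if the components are Gaussian with unit variance."""
    return np.mean(components**4, axis=0)

# Null-model example: N = 200 uncorrelated Gaussian series of length T = 1290 (T / N close to 6.45).
rng = np.random.default_rng(3)
vals, u = rescaled_eigenvectors(np.corrcoef(rng.standard_normal((200, 1290))))
bulk = u[:, 50:150]     # eigenvectors from the middle of the spectrum
```

Applied to the empirical $C$, the same check would single out the eigenvectors outside the bulk, above all the one belonging to $\lambda_{1000}$.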

The component $u_{ki}$ of a given eigenvector $k$ relates to the contribution of company $i$ to that eigenvector. Hence, the distribution of the components contains information about the number of companies contributing to a specific eigenvector. In order to distinguish between an eigenvector with approximately equal components and one with a small number of large components, we define the inverse participation ratio [17, ?]

$$I^k \equiv \sum_{l=1}^{1000} \left[u_{kl}\right]^4, \qquad (4)$$

where $u_{kl}$, $l = 1, \ldots, 1000$, are the components of eigenvector $k$. The physical meaning of $I^k$ can be illustrated by two limiting cases: (i) a vector with identical components $u_{kl} = 1/\sqrt{N}$, where $N = 1000$ is the number of stocks, has $I^k = 1/N$, whereas (ii) a vector with one component $u_{kl} = 1$ and all the others zero has $I^k = 1$. Therefore, $I^k$ is related to the reciprocal of the number of vector components significantly different from zero.
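Eq. (4) is a one-line reduction over eigenvector components. A sketch, assuming unit-norm eigenvectors stored as the columns of a NumPy array (the convention of `np.linalg.eigh`), that reproduces the two limiting cases above; the function name is our own.

```python
import numpy as np

def inverse_participation_ratio(vecs):
    """Eq. (4): I^k = sum_l u_kl^4 for each unit-norm eigenvector (one per column)."""
    vecs = np.asarray(vecs, dtype=float)
    return np.sum(vecs**4, axis=0)

N = 1000
uniform = np.full((N, 1), 1 / np.sqrt(N))   # case (i): identical components
single = np.zeros((N, 1))
single[0, 0] = 1.0                          # case (ii): one nonzero component

ipr_uniform = inverse_participation_ratio(uniform)[0]   # equals 1/N
ipr_single = inverse_participation_ratio(single)[0]     # equals 1
```

The reciprocal $1/I^k$ can be read as a rough effective number of participating companies: 1000 in case (i) and 1 in case (ii).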

The inset of Fig. 3 shows $I^k$ for eigenvectors of a matrix generated from uncorrelated time series with a power-law distribution of price changes [?]. The average value of $I^k$ is $\langle I \rangle \approx 3 \times 10^{-3} \sim 1/N$, indicating that the vectors are extended [24, 25], i.e., almost all companies contribute to them. Fluctuations around this average value are confined to a narrow range. On the other hand, the empirical data show deviations of $I^k$ from $\langle I \rangle$ for a few of the largest eigenvalues. These $I^k$ values are approximately 4-5 times larger than $\langle I \rangle$, which suggests that groups of approximately 50 companies contribute to these eigenvectors. The corresponding eigenvalues are well outside the bulk, suggesting that these companies are correlated [?].
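The control in the inset of Fig. 3 can be imitated with synthetic data. This sketch assumes Student's t-distributed price changes with 5 degrees of freedom as the power-law distribution (the text does not give the exponent) and uses a smaller $N$ than the 1000 stocks to keep it fast. For extended eigenvectors, $\langle I \rangle$ should come out near $3/N$, which for $N = 1000$ is the value $\langle I \rangle \approx 3 \times 10^{-3}$ quoted above.

```python
import numpy as np

def inverse_participation_ratio(vecs):
    """Eq. (4) for unit-norm eigenvectors stored as columns."""
    return np.sum(vecs**4, axis=0)

rng = np.random.default_rng(4)
N, T = 200, 1290                               # hypothetical sizes with T / N close to 6.448
G = rng.standard_t(df=5, size=(N, T))          # uncorrelated, heavy-tailed price changes
vals, vecs = np.linalg.eigh(np.corrcoef(G))
ipr = inverse_participation_ratio(vecs)
mean_ipr = ipr.mean()                          # expected near 3 / N for extended vectors
participants = 1 / ipr                         # rough effective number of contributing series
```

Running the same reduction on the empirical eigenvectors and comparing with `mean_ipr` is what singles out the edge states discussed here.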



Surprisingly, we also find $I^k$ values as large as 0.35 for vectors corresponding to the smallest eigenvalues $\lambda_k \approx 0.25$ [?]. These values are two orders of magnitude larger than $\langle I \rangle$, which suggests that the vectors are localized [?, 25], i.e., only a few companies contribute to them. The small values of the corresponding eigenvalues suggest that these companies are uncorrelated with each other.

The presence of vectors with large $I^k$ also arises in the theory of Anderson localization [27]. In the context of localization theory, one frequently finds "random band matrices" [?] containing extended states with small $I^k$ in the middle of the band, whereas edge states are localized and have large $I^k$. Our finding of localized states for small and large eigenvalues of the cross-correlation matrix $C$ is reminiscent of Anderson localization and suggests that $C$ may be a random band matrix.
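To illustrate the band-matrix picture, consider a hypothetical example: a real symmetric matrix with Gaussian entries only within distance $b$ of the diagonal. In such matrices, states near the band edges are expected to be more localized (larger $I^k$) than states near the middle of the spectrum. The sizes $N$ and $b$ and the number of realizations are arbitrary choices for the illustration; nothing here is fitted to stock data.

```python
import numpy as np

def random_band_matrix(N, b, rng):
    """Real symmetric Gaussian matrix with entries zeroed beyond distance b from the diagonal."""
    A = rng.standard_normal((N, N))
    H = (A + A.T) / 2
    i, j = np.indices((N, N))
    H[np.abs(i - j) > b] = 0.0
    return H

def edge_vs_middle_ipr(N=400, b=4, n_edge=10, realizations=5, seed=5):
    """Mean I^k of the n_edge lowest plus n_edge highest states versus 2*n_edge middle states."""
    rng = np.random.default_rng(seed)
    edge, middle = [], []
    for _ in range(realizations):
        _, vecs = np.linalg.eigh(random_band_matrix(N, b, rng))
        ipr = np.sum(vecs**4, axis=0)          # Eq. (4); eigenvalues in ascending order
        edge.extend(ipr[:n_edge])
        edge.extend(ipr[-n_edge:])
        mid = N // 2
        middle.extend(ipr[mid - n_edge: mid + n_edge])
    return float(np.mean(edge)), float(np.mean(middle))

edge_ipr, middle_ipr = edge_vs_middle_ipr()
```

The qualitative pattern (large $I^k$ at both edges, small $I^k$ in the middle) is the one seen for $C$ in Fig. 3.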



In summary, we find that most of the eigenvalues in the spectrum of the cross-correlation matrix of stock price changes agree surprisingly well with the universal predictions of random matrix theory. In particular, we find that $C$ satisfies the universal properties of the Gaussian orthogonal ensemble of real symmetric random matrices. Through the analysis of the inverse participation ratio of its eigenvectors, we find that $C$ may be a random band matrix, which may support the idea that a metric can be defined on the space of companies and that a distance can be defined between pairs of companies [?]. Hypothetically, the presence of localized states may allow us to draw conclusions about the "spatial dimension" of the set of stocks studied here and about the "range" of the correlations between the companies.
We thank M. Barthelemy, N.V. Dohkolyan, X. Gabaix, U. Gerland, S. Havlin, R.N. 
FIG. 1. The probability density of the eigenvalues of the normalized cross-correlation matrix $C$ for the 1000 largest stocks in the TAQ database for the 2-year period 1994-95 [?]. Recent analytical results [21] for cross-correlation matrices generated from uncorrelated time series predict a finite range of eigenvalues depending on the ratio $R$ of the length of the time series to the dimension of the matrix [?]. In our case $R = 6.448$, corresponding to eigenvalues distributed in the interval $0.37 < \lambda_k < 1.94$ [21]. However, the largest eigenvalue for the 2-year period (inset) is approximately 30 times larger than the maximum eigenvalue predicted for uncorrelated time series. The inset also shows the largest eigenvalue of the cross-correlation matrix for 4 half-year periods, denoted A, B, C, D. The arrow in the inset corresponds to the largest eigenvalue for the entire 2-year period, $\lambda_{1000} \approx 50$. The distribution of eigenvector components for the large eigenvalues well outside the bulk shows significant deviations from the Gaussian prediction of RMT, which suggests "collective" behavior or correlations [18] between different companies. The largest eigenvalue would then correspond to the correlations within the entire market [10].
FIG. 2. Comparison of the RMT predictions for the spacing distributions with results for the empirical cross-correlation matrix. (a) Nearest-neighbor (nn) spacing distribution of the eigenvalues of $C$ after unfolding. We use the Gaussian broadening procedure [19]. The eigenvalue distribution can be considered as a sum of delta functions about each eigenvalue $\lambda_k$, each of which is then "broadened" by choosing a Gaussian distribution with standard deviation $(\lambda_{k+a} - \lambda_{k-a})/2$, where $2a$ is the size of the window used for broadening [19]. Here, $a = 15$, the optimum value obtained from Fig. 2(d). The solid line is the GOE prediction, Eq. (3), and the dashed line is a fit to the one-parameter Brody distribution $P_\beta(s) = B(1+\beta)\, s^\beta \exp(-B s^{1+\beta})$, with $B = \left[\Gamma\left(\frac{2+\beta}{1+\beta}\right)\right]^{1+\beta}$. The fit yields $\beta = 0.99 \pm 0.02$, in good agreement with the GOE prediction $\beta = 1$. A Kolmogorov-Smirnov test suggests that the GOE is $10^{?}$ times more likely to be the correct description than the Gaussian unitary ensemble, and $10^{?}$ times more likely than the GSE. Furthermore, at the 80% confidence level, the Kolmogorov-Smirnov test cannot reject the hypothesis that the GOE is the correct description. (b) Next-nearest-neighbor (nnn) spacing distribution of $C$. RMT predicts that, for the GOE, the distribution of next-nearest-neighbor spacings should follow the same distribution as the nearest-neighbor spacings of the GSE. This prediction is confirmed for the empirical data both visually and by a Kolmogorov-Smirnov test that, at the 40% confidence level, cannot reject the hypothesis that the GSE is the correct distribution. (c) Number variance and (d) spectral rigidity of $C$ for different values of the unfolding parameter $a$, compared with the exact expression for the GOE (solid line) and the uncorrelated case (dashed line). As $a$ increases, both the number variance and the spectral rigidity approach the theoretical curve for the GOE, while the spacing distribution remains essentially unchanged. We choose $a = 15$ as the optimal value.
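The one-parameter Brody distribution in the caption interpolates between uncorrelated levels ($\beta = 0$ gives $e^{-s}$) and the GOE form of Eq. (3) ($\beta = 1$). A minimal sketch of a maximum-likelihood fit of $\beta$ by grid search; the text does not state how its fit was done, and the synthetic spacings, drawn from Eq. (3) by inverse-transform sampling, merely stand in for real data.

```python
import math
import numpy as np

def brody(s, beta):
    """Brody distribution P_beta(s) = B (1 + beta) s^beta exp(-B s^(1 + beta))."""
    B = math.gamma((2 + beta) / (1 + beta)) ** (1 + beta)
    return B * (1 + beta) * s**beta * np.exp(-B * s**(1 + beta))

def fit_brody(spacings, betas=np.linspace(0.0, 1.5, 301)):
    """Return the beta on the grid that maximizes the log-likelihood of the spacings."""
    loglik = [np.sum(np.log(brody(spacings, b))) for b in betas]
    return float(betas[int(np.argmax(loglik))])

rng = np.random.default_rng(6)
v = 1.0 - rng.random(5000)                        # uniform on (0, 1]
goe_spacings = np.sqrt(-4 / np.pi * np.log(v))    # inverse CDF of Eq. (3)
beta_hat = fit_brody(goe_spacings)                # should land close to 1
```

A least-squares fit to a histogram, or a continuous optimizer in place of the grid, would be equally reasonable choices.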
FIG. 3. Inverse participation ratio $I^k$ for each of the 1000 eigenvectors. As a control, we show in the inset the $I^k$ values for the eigenvectors of a cross-correlation matrix computed from uncorrelated, independent, power-law-distributed time series [?] of the same length as the data. Empirical data show marked peaks at both edges of the spectrum, whereas the control shows only small fluctuations around the average value $\langle I \rangle = 3 \times 10^{-3}$. The large $I^k$ values for the largest eigenvalues are to be expected from Fig. 1, but the large values of $I^k$ for the small eigenvalues are surprising. Large $I^k$ values at the edges of the eigenvalue spectrum are often found in localization theory.